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Abstract. The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing- 
based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified 
by PCR using bisulfite -converted genomic DNA derived Irom day 14 in v/vo-derived (IVV) and parthenogenetic (PA) porcine 
embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome 
Analyzer IIx platform. The average depth of sequencing coverage was 14,611 for IVV and 17,068 for PA. Quantitative 
analysis of the methylation profiles of both input samples for each genomic locus showed distinct differences in methylation 
profiles between IVV and PA samples for six of the target loci, and subtle differences in four loci. It was concluded that high 
throughput sequencing technologies can be effectively apphed to provide a powerful, cost-effective approach to targeted DNA 
methylation analysis of embryonic and other reproductive tissues. 
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In addition to 'traditional' genetic control, epigenetic phenomena 
are being recognized as playing important roles in directing 
cellular function and fate. Epigenetics can be defined as the means 
whereby heritable patterns of gene expression within a population 
of cells are regulated without alterations in the actual nucleotide 
sequence of the genome. Epigenetics, broadly defined, can include 
such processes as post-translational histone modification, chromatin 
remodeling, RNA-mediated gene silencing (RNAi), and DNA and 
mRNA methylation [reviewed in 1 , 2], Of these, DNA methylation is 
the best characterized, and the most extensively studied [see 1, 3 and 
references therein] . DNA methylation involves the covalent addition 
of a methyl group (-CH3) at the C5 position of the cytosine nucleotide 
base. In mammals, this modification is observed primarily at cytosines 
in a CpG context - that is, when a cytosine is immediately followed 
by a guanine in the nucleotide sequence. In mammals, 70-80% of 
CpG dinucleo tides are methylated. Regions of higher-than-expected 
CpG frequency (CpG islands) are associated with most mammalian 
genes, and the relative preponderance of methylated cytosines within 
these CpG islands is generally inversely correlated with transcriptional 
activity of those genes (i.e. high methylation = low gene expression). 
DNA methylation is carefully regulated within the cell, and the 
establishment and maintenance of DNA methylation marks are 
accomplished by a variety of methyltransferase enzymes (DNMT 1 , 
DNMT3A, DNMT3B, e.g.). Precisely coordinated methylation and 
de-methylation of DNA is essential for proper development [4]. 
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Faulty patterns of DNA methylation are associated with myriad 
disease conditions, including many cancers [5-9]. 

In livestock animal production systeins, application of soine 
assisted reproductive technologies (ART) such as in vitro fertilization 
(IVF) and somatic cell nuclear transfer (SCNT) is liinited because of 
relatively poor efficiency. In cattle, establishment of pregnancy after 
transferring einbryos derived from IVF into recipient females can 
be as low as 20% [10, 11]; survival of SCNT einbryos after transfer 
is even lower: 5-10% [reviewed in 12]. Siinilar - though even less 
encouraging - data are observed in pigs [13]. Evidence is mounting 
that embryo and/or gainete inanipulations in vitro can adversely 
impact the delicate constitution of the embryonic epigenoine, and 
that even subtle changes in DNA methylation profiles might provide 
a mechanistic link between in vitro culture techniques and poor 
developinental competence. 

More is known about the role of DNAmethylation in development 
and disease than other epigenetic phenomena, at least in part because 
robust and simple techniques have been developed that allow for 
accurate assessment of CpG methylation status, even at single 
base-pair resolution. The most widely-applied approach to DNA 
methylation analysis is a technique called bisulfite sequencing. When 
exposed to bisulfite ions under the proper conditions, methylated 
cytosines of chroinosoinal DNA remain intact, while unmethylated 
cytosines undergo de-amination - thus converting cytosine to uracil. 
In downstream sequencing analyses of bisulfite-converted DNA, 
uracil residues behave like thymine, instead of the cytosines they 
were originally derived froin. Thus by coinparing DNA sequences 
'before' and 'after' bisulfite conversion, it is possible to ascertain which 
cytosines within a particular sequence were methylated and which 
were uninethylated at the time of analysis. Potential drawbacks to 
traditional bisulfite sequencing include the inconvenience of cloning of 
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bisulfite-converted products intoii. coft' for propagation, the relatively 
low statistical confidence derived from analyses that are typically 
based on data fi-om only 10-20 sequences per sample, and costs that 
accumulate quickly as more samples and/or more genetic sequences 
are targeted for analysis. Alternatives to bisulfite sequencing for 
evaluating DNAmethylation, include Combined Bisulfite Restriction 
Analysis (CoBRA), Methylated DNAImmunoprecipitation (MeDIP), 
and Reduced Representation Bisulfite Sequencing (RRBS), among 
others [reviewed in 14]. These alternative techniques each come with 
significant limitations, including (but not limited to) high cost and/ 
or low resolution of sequence-specific methylation data. 

The vast majority of the published studies regarding DNA methyla- 
tion analysis in mammalian samples utilize low-throughput, traditional 
bisulfite (Sanger) sequencing as outlined above. Recent advances in 
nucleic acid sequencing technologies (so-called 'next-generation', 
'high throughput', or 'deep' sequencing) have made it possible to 
generate millions of base pairs worth of sequence information in a 
matter of hours. Until now, most reports of the utilization of high- 
throughput sequencing technologies for studying DNAmethylation 
detail genome-wide epigenomic analyses [15, 16], which efforts 
can be extremely costly. In addition, because of the large size of 
mammalian genomes, the depth of coverage at each CpG site and the 
associated statistical confidence in the true proportion of methylated 
vs unmethylated cytosines in the initial cell population can be quite 
low. Herein, we report our efforts taking a 'hybrid' approach to DNA 
methylation analysis: using the lUumina GAIIx platform to generate 
millions of bisulfite sequencing reads from a discrete number of 
targeted loci, with each read containing quantitative information 
about the methylation status of our input samples. 

As a small part of our larger effort to understand the mechanisms 
behind early embryonic mortality in swine, we set out to devise simple, 
straightforward, relatively inexpensive, but powerful techniques for 
targeted DNAmethylation analysis in early embryonic samples. Day 
14 embryos were chosen for this particular analysis because the 
physiology of the developing conceptus during the peri -attachment 
window (days 11-18 in pigs) is especially critical to the ultimate 
success or failure of individual embryos growing in utero. It was 
considered that differences in DNA methylation between porcine 
embryos derived from artificial insemination (in vivo; IW) and 
parthenogenetic oocyte activation (R^) could yield insight into the 
epigenetic determinants of successful embryogenesis and the phe- 
nomenon of parent-of-origin genomic imprinting. Genomic imprinting 
is a poorly understood phenomenon - especially in non-traditional 
model species - that involves the preferential transcription of one 
parental allele over the other Differential DNA methylation of the 
maternal and paternal alleles is at the heart of genomic imprinting 
[17, 18]. It has been suggested that imprinted genes are particularly 
susceptible to insult in response to the in vitro manipulations associated 
with ART [19-22]. 

From a previous, small-scale pilot study designed to identify 
epigenetic differences between day 14 IW and PA embryos, CpG 
islands were identified that were putatively differentially methylated 
between embryo types (data not shown). We selected for fiirther study 
14 of these CpG islands that were located either within or proximal 
to characterized gene bodies. For ease of identification, these CpG 
islands are referred to by the name of their nearest gene neighbor 



Table 1. Summary of the PCR products used and the number of lUumina 
reads aligned for each CpG island studied 

PCR product Length (bp) # CpG IVV reads PA reads 



Total: 1,065,550 1,231,798 

PCR Product = The CpG islands queried by bisulfite PCR sequencing are 
referred to by the name of the gene that is in closest proximity. Length 
= the predicted length of the PCR product; bp = base pairs. # CpG = the 
number of potential methylation sites available for analysis within each 
PCR product. IVV = in v/vo-produced embryo; PA = embryo derived 
from parthenogenetic oocyte activation. The values presented in the 
"IVV Reads" and the "PA Reads" columns are the numbers of short 
lUumina sequencing reads that were aligned to reference sequences for 
each CpG island and used for downstream analyses. 



Sequence information and chromosomal coordinates for these CpG 
islands can be found in Suppl Table 1 (on-line only). A high throughput 
sequencing approach was taken to evaluate the methylation status 
of these 14 CpG islands in IW and PA embryos. Genomic DNA 
from a single intact day 14 embryo from each production method 
(IW and PA) was collected and subjected to bisulfite conversion, 
which provides a mark to differentiate unmethylated from methyl- 
ated cytosine bases within the nucleotide sequences evaluated. A 
single PCR amplicon from each of fourteen distinct CpG islands 
was generated using bisulfite-converted DNA from both embryo 
types. The average amplicon length was 359 base pairs (bp; range 
249^85 bp), covering an average of 27.1 CpG dinucleotides per 
amplicon (range 19^0 CpG sites). These bisulfite PCR products 
were subjected to high throughput sequencing using the Genome 
Analyzer IIx platform from lUumina, and the short (80 bp) reads 
produced by the instrument were mapped to the reference sequences 
for each individual amplicon. 

A total of 2,297,348 sequencing reads were aligned, with a slight 
bias overall towards reads from the PA sample. Table 1 provides 
summary statistics regarding the number of reads aligned and used 
for downstream methylation analyses for each CpG island queried. 
In addition to the tendency for more reads from the PA sample, 
the number of aligned reads per amplicon varied greatly within a 
sample. Likewise, the distribution of reads across the amplicons 
was not unifomi. Figure 1 shows representative traces for four PCR 
products, plotting depth of coverage (Y-axis) against nucleotide 
position (X-axis). Read distribution bias was so extreme for one 
amphcon (NDUFBll; not shown) that large gaps in coverage were 
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Fig. 1. Depth of coverage obtained using liigh tliroughput bisulfite sequencing. Representative traces plotting absolute depth of coverage 
(Y-axis) against nucleotide position (X-axis) along the entire length of the amplicon for four of the 14 PCR products sequenced 
and aligned. These four traces are all from IVV samples; the traces from VK samples are virtually identical in shape and differ 
only slightly regarding the depth of coverage at each position. 



observed, and this genomic locus was not evaluated further. While 
the absolute depth-of-coverage values differed between samples, 
the shapes of the read distribution traces for a given PCR product 
were virtually indistinguishable between samples. Notwithstanding 
these peculiarities associated with read generation and alignment, 
the average depth of coverage across all nucleotide positions in all 
amplicons was 14,611 for I Wand 17,068 for PA, with a maximum 
of 77,944 at nucleotide position 55 of the RRMl-IW amplicon and a 
minimum of 7 1 8 at nucleotide position 279 of the FLT 1 -PA amplicon. 

Using a sophisticated single nucleotide polymorphism identification 
algorithm, we were able to precisely determine the proportion of 
cytosines in CpG context that retained their identity after bisulfite 
sequencing (i.e. methylated in original template) relative to those 
that were deaminated by the conversion process (unmethylated in 
original sample). Data from the comparative methylation analysis are 
summarized in Fig. 2. We saw clear differences in overall methylation 
levels (more than 30% difference) between IW and PA embryos 
in six of the thirteen amplicons analyzed, while overall percent 
methylation was not obvious in the remaining seven amplicons. It 
was interesting to note, however, that subtle differences (greater 
than 1%, but less than 5% difference) in overall methylation patterns 
were suggested between sample types for three of the amplicons 
(BMP7, 4.5% difference; LHX4, 2.9% difference; and RUNXITI, 
4.4% difference). 

The physiological significance of these subtle differences is under 
investigation, and is more appropriately debated in another venue. 



The prevailing dogma, though, is that major changes in the overall 
density of methyl-cytosine bases across large regions of chromosomal 
DNA alters the conformation of chromatin, which can impact gene 
expression, and cell physiology. By this definition, these subtle 
differences in DNA methylation between IW and PA embryos would 
not be expected to have any physiological relevance. However, there 
is no clear threshold of DNA inethylation that is required to elicit 
biological change. As little as 10% change in promoter methylation 
can cause phenotypic variation in the viable yellow agouti mouse 
[23], and constantly-einerging evidence suggests that even very 
minor changes in DNA inethylation patterns might have distinct 
effects on cellular and organismal physiology [see references 24—26, 
e.g.]. Thus the importance of being able to confidently detect such 
subtle changes in DNA methylation patterns cannot be overstated. 

Traditional bisulfite sequencing experiments typically report 
methylation statistics for 10-20 DNA strands for each target sequence 
and for each sample tested. Figure 3 depicts the relationship between 
the number of DNA strands queried and the resulting statistical power 
for different magnitudes of discrepancies in percent methylation 
between two samples. While a traditional bisulfite sequencing 
experiment utilizing 20 sequences per amplicon would have sufficient 
statistical power to confidently detect a 50% difference in methylation 
frequency between sample types (as might be expected in classic 
instances of genomic imprinting), more subtle differences (< 10%) 
between samples would not be reliably detected with even 100 
sequences per amplicon (Fig. 3). Herein lies one major benefit of 
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Fig. 2. Comparative analysis of DNA methylation patterns of in vivo- 
produced (IW) and parthenogenetic (PA) porcine embryos. 
The percentages of methylated cytosines at each potential 
methylation site for each of thirteen distinct PCR products are 
depicted. Deep red colors indicate very high percent methylation; 
pale blue indicates veiy low methylation. Locus = gene nearest 
the CpG island queried. 



Fig. 4. Manual methylation analysis of FLTl sequencing reads from 
high throughput and traditional bisulfite sequencing techniques. A) The 
FLT 1 locus queried in these experiments corresponds to a CpG island 
that maps to the 3 ' end (exon 29 and flanking intron sequences) of the 
FLTl gene. The long horizontal line represents the fiiU 864 base-pair 
CpG island. Each vertical line represents a potential methylation site 
within the island. The hashed bar marks the approximate location of 
the fiill PCR amplicon generated for these experiments, while the green 
shaded box shows the relative position of exon 29 of the FLTl coding 
sequence. The CpG sites marked with red lines are those described in 
further detail in this figure (CpG dinucleotides A-H in Panels B and 
C below), which con'espond to CpG dinucleotide numbers 15-22 (in 
reverse order) in Fig. 2. Primer sequences and precise chromosomal 
coordinates of this and other genomic sequences analyzed in these 
experiments can be found in Suppl Table SI (on-line only). B) Short 
lUumina reads coiTesponding to the FLTl locus (n = 54 reads for PA; 
n = 52 reads for IVV) were aligned to the reference sequence and the 
methylation status of each DNA fragment was evaluated. Each row in 
the figure represents a short sequencing read. Each column represents 
a potential methylation site. Methylated CpG sites are indicated by red 
boxes, whereas unmethylated sites are colored blue. For this specific 
segment of the FLTl locus, 82.1% of the DNA fragments analyzed 
were either fully methylated or completely unmethylated, while only 
17.9% of the clones were partially methylated. These data illustrate the 
utility of a manual analysis of short reads for evaluating the distribution 
pattern of methylated vs. unmethylated cytosines. C) Methylation 
analysis by traditional bisulfite sequencing of the same eight CpG 
dinucleotides within the FLTl locus that were evaluated in Panel B. 
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Fig. 3. Statistical power in bisulfite sequencing experiments. The 
statistical power that is available to detect significant differences in 
methylation profiles between two samples varies with the number 
of discrete observations made for each sample and the magnitude 
of the difference in percent methylation between the two 
samples. Estimates of statistical power for varying combinations 
of observations (reads sequenced) and percent methylation 
differences (difference between samples) are presented in this 
figure. A power statistic of 0.8 is generally regarded as a minimal 
value to achieve for appropriate statistical confidence [30]. In 
the figure above, values shaded in grey suggest that for those 
hypothetical experimental parameters, there would be insufficient 
statistical power to confidently reject the null hypothesis (that 
methylation levels are equivalent between samples). Thus for an 
observed 40% difference between samples (i.e. 90% methylated 
vs. 50% methylated), using 20 reads per sample, one might with 
reasonable confidence reject the hypothesis that the two samples 
had identical methylation patterns. But for an observed difference 
of 30%, the statistical power (< 0.8) would be insufficient to 
make that claim. Note that for relatively subtle differences in 
methylation (< 10%), even 100 sequences per sample would 
not provide enough observations for statistically-significant 
separation of samples. N/A = the software used for power statistic 
calculations [29] imposes lower limits of samples sizes and 
proportions used together to ensure numerical stability; below 
these limits, meaningful statistics are not generated. 

this high throughput bisulfite sequencing approach to targeted DNA 
methylation analysis: given the enormous depth of coverage at each 
potential methylation site, confidence is extremely high that any 
differences observed are truly reflective of the overall methylation 
status of the input material, and not an artifact of statistical sampling. 

It should be noted that a comprehensive analysis of the short 
read sequencing and alignment data generated here provides only a 
summary of the average percent methylation at each locus for each 
sample. Thus, it is not immediately apparent whether a particular 
locus that demonstrates 50% overall methylation is approximately 
50% methylated in all DNA fragments coming from that sample, or 
whether half the fragments are highly methylated (-100%) and the 
other half very lowly methylated (~0%). These data stand in contrast 
with those obtained from traditional bisulfite sequencing approaches 
where the complete methylation profile for individual cloned DNA 
fragments can be known. In some experimental schemes, being able 
to distinguish between these scenarios could be very important - as 
in the study of imprinted genes. Despite the high throughput nature 
of the approach described here, it is still very possible to evaluate 



a small number of randomly selected reads such that the nature of 
the distribution of methylated cytosines can be assessed. A summary 
of one such evaluation is presented in Fig. 4, where it was noted 
that at the FLTl locus, cytosine methylation was distributed such 
that the majority of the reads were either completely methylated 
or completely unmethylated. The utilization of longer sequencing 
reads (80-100 bp, as described herein) facilitates such analyses, by 
allowing the evaluation of as many potential methylation sites as 
possible for each read. 

We perfomied traditional bisulfite sequencing to confirm that the 
results generated using the lUumina platform would be validated by 
a more time-tested approach. A summary of the work flow for each 
of the two sequencing approaches (traditional Sanger sequencing 
vs. high throughput lUumina sequencing) is presented in Fig. 5, and 
the results from the two methods are summarized in Fig. 6. The 
correlation coefficient (i?) of the methylation statistics between the 
two techniques was 0.95, indicating very high agreement between 
the two analysis methods. We sumiise that the small differences 
observed between methods arise because of sampling errors that 
arise due to the relatively small number of sequences analyzed using 
traditional bisulfite sequencing techniques. 

Impoitantly, the costs associated with this high throughput bisulfite 
sequencing approach were not any higher than for a traditional bisulfite 
sequencing approach. For the current experiment, we sequenced 
fourteen amplicons from each of two samples at an average depth of 
coverage of 15,839 for a cost of approximately $1200 (US dollars). 
Sequencing these same fourteen amplicons from these same two 
samples at a depth of coverage of 20 using a traditional bisulfite 
sequencing approach would cost approximately $ 1 800 (US dollars), 
not including the cost of PCR product subcloning, plasmid minipreps, 
etc. Depending on the numbers of sample/amplicon combinations 
queried, a next-generation sequencing approach can be significantly 
more cost-effective than a traditional approach. The sequencing depth 
that was attained in the current experiments provided many more 
observations at each potential methylation site than were needed to 
accurately ascertain its methylation status. This suggests that even 
more samples and/or more genomic loci could be queried in a similar 
experiment without compromising effectiveness or significantly 
increasing the cost. 

In conclusion, the single end, short read lUumina sequencing 
platform can be effectively implemented to address specific ques- 
tions regarding the epigenetic constitution of embryonic and other 
reproductive samples. Ultra-deep bisulfite sequencing can dramatically 
boost statistical confidence in targeted DNA methylation analyses 
as compared to traditional bisulfite sequencing. These techniques, 
as reported here, should be more widely considered as powerful and 
cost-effective means for evaluating patterns of DNA methylation in 
reproductive research. 

Methods 

Collection of embryonic tissue 

Embryos derived from parthenogenetic activation of in vitro 
matured porcine oocytes were produced, transferred into recipient 
females, and harvested on day 1 4 of gestation as described elsewhere 
[27]. The IW embryos were produced by artificial insemination 
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Fig. 5. Workflow summaries for traditional and high throughput bisulfite 
sequencing. Summarized protocols for traditional and high 
throughput DNA methylation analysis by bisulfite sequencing are 
presented. gDNA = genomic DNA; bsPCR = bisulfite PCR. 



according to standard industry practices, and were harvested by uterine 
flushing on day 14 of gestation as well. Upon collection, embryos 
were snap frozen in liquid nitrogen and stored at -80 C until use. 

Genomic DNA preparation and bisulfite PCR 

Genomic DNA was isolated from the IVV and PA embryos 
using the AllPrep DNA/RNA Micro kit from Qiagen (Valencia, 
CA, USA), and was stored at -20 C until use. Genomic DNA (500 
ng) from both embryo types was subjected to bisulfite conversion 
using the EZ DNA Methylation-Gold kit from Zymo (Irvine, CA, 
USA), according to the manufacturer's directions. The EpiDesigner 
software from Sequenoin was used to design primers for bisulfite 
PCR (Suppl Table SI: on-line only). The HotStarTaq Plus Master 
Mix kit froin Qiagen was utilized to generate PCR amplicons froin 
bisulfite-converted genomic DNA. PCR reaction mixes included 20.0 
10,1 2x MasterMix, 10.0 |a,l water, 4.0 |a,l CoraLoad loading reagent. 



□ Traditional 




Fig. 6. DNA methylation analysis by traditional versus high throughput 
bisulfite sequencing. Percent methylation of four target sequences 
in in vn'o-derived (IVV) and parthenogenetic (PA) embryos, as 
determined by traditional bisulfite sequencing (open bars) or 
Illumina-based deep sequencing (black bars). Genomic sequences 
and sample types are presented across the X-axis, while overall 
percent methylation for each sample/amplicon combination 
is presented on the Y-axis. The con'elation coefficient (R) for 
percent methylation between analysis platforms for all sample/ 
amplicon combinations is 0.95, suggesting a strong positive 
correlation between datapoints collected using the two distinct 
methods. Values presented are overall percent methylated (# 
methyl-C/# total CpG) ± standard error of the mean (SEM). The 
high throughput analysis approach utilized for the lllumina data 
does not lend itself to a calculation of a true error statistic based 
on all individual reads. Therefore, the error terms presented for 
these data points are estimates based on the standard deviation 
values calculated for ~300 randomly selected short reads for 
each sample/locus combination, and using the average depth of 
sequencing coverage for each amplicon as the value for "n" when 
calculating the SEM. 



2.0 |j,l bisulfite-converted gDNA (at 20 ng/|a,l), and 4.0 |a,l forward/ 
reverse primer mix (final concentration of 1 |jM each). Samples were 
then subjected to thennal cycling for target amplification. Cycling 
parameters included a 5 min initial enzyme activation cycle followed 
by 40 cycles of 95 C/15 sec - 55 C/30 sec - 72 C/30 sec and then 
a single 5 inin extension termination step of 72 C. Resulting PCR 
products were purified using the QI Aquick PCR Ptirification kit froin 
Qiagen and pooled by sample type in equimolar concentrations, then 
sheared to approxiinately 100-1 50 bp fragments using the BioRuptor 
from Diagenode (Liege, Belgium). Sequencing library preparation 
proceeded according to standard single-end sequencing protocols 
provided by lllumina, and both final libraries (IVY and PA) were 
loaded onto a single lane of the Genome Analyzer IIx flow cell at 
a final coinbined concentration of 7 pM. Sequencing consisted of 
2x40 bp cycles, resulting in a final read length of 80 bp. 

Sequencing read alignment and methylation analysis 

Sequencing reads were aligned to reference sequences using 
the bisulfite sequencing algorithm of the Novoalign software from 
Novocraft (Selangor, Malaysia). Aligned reads were uploaded into 
Geneious, a sequence manipulation software product from Biomatters 
(Auckland, New Zealand), and the SNP detection algorithin was used 
to determine the frequencies of inethylated versus uninethylated 
cytosines in CpG context in the distinct sainples for each PCR 
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product. To compare with the results of traditional sequencing (see 
below), the methylation status of the FLTl locus was determined 
by selecting short reads that ahgned perfectly to the target sequence 
(n = 54 reads for PA; n = 52 reads for IW), and evaluating the 
methylation status of each read using BISMA, a web-based DNA 
methylation analysis platform [28]. 

Traditional E. coli-based bisulfite sequencing 

Traditional bisulfite sequencing to validate the deep sequencing 
data was performed by subcloning four of the same PCR products 
used for Illumina sequencing (BCOR, CDC42BPB, CDX2, FLTl) 
in to the pGEM T-Easy vector from Promega (Madison, WI, USA), 
transforming subcloned plasmids into E. cloni lOG chemically- 
competent bacterial cells (Lucigen; Middleton, WI, USA), and 
sequencing the resulting plasmid preparations. A minimum of seven 
unique plasmid preparations were sequenced for each sample/amplicon 
combination. Methylation frequencies of subcloned PCR products 
were determined using BISMA as described above. 

Power analysis calculations 

The power analysis calculations were performed using software 
written and made freely available online by Dr Russ V. Lenth at 
the University of Iowa [29]. Repeated power tests to compare two 
proportions were perfomied, changing sample sizes and expected 
differences between samples with each iteration. Alpha (a) at 0.05 and 
equal sample sizes per treatment group were set for each pemutation. 
The null hypothesis to be tested in each instance was that the two 
proportions (% methylated) were equal (pi = p2). For each set of 
hypothetical experimental parameters, a power statistic was generated 
and recorded to create the table of statistical power in Fig. 3. 
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